SACA: Selective Attention-Based Clustering Algorithm
Bilehsavar, Meysam Shirdel, Ghaedi, Razieh, Taheri, Samira Seyed, Fan, Xinqi, O'Reilly, Christian
Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by their reliance on critical parameter tuning, which typically requires significant domain expertise. This paper introduces a novel density-based clustering algorithm, loosely inspired by the concept of selective attention, designed to minimize reliance on parameter tuning for most applications. The proposed method computes an adaptive threshold to exclude sparsely distributed points and outliers, constructs an initial cluster framework, and subsequently reintegrates the filtered points to refine the final results. Extensive experiments on diverse benchmark datasets demonstrate the robustness, accuracy, and ease of use of the proposed approach, establishing it as a powerful alternative to conventional density-based clustering techniques.
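The filter-cluster-reintegrate pipeline described in the abstract can be sketched in plain NumPy. This is a generic illustration, not the authors' SACA implementation: the k-nearest-neighbour density proxy, the mean-plus-one-standard-deviation threshold, and the `link_radius` linking rule are all hypothetical choices.

```python
import numpy as np

def adaptive_density_cluster(X, k=5, link_radius=1.0):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # density proxy: mean distance to the k nearest neighbours
    knn_dist = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    # adaptive cut: mean + one standard deviation (a hypothetical choice)
    dense = knn_dist <= knn_dist.mean() + knn_dist.std()
    idx = np.where(dense)[0]
    labels = np.full(n, -1)
    current = 0
    # initial cluster framework: connected components of dense points
    for i in idx:
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = current
        while stack:
            j = stack.pop()
            for m in idx:
                if labels[m] == -1 and D[j, m] <= link_radius:
                    labels[m] = current
                    stack.append(m)
        current += 1
    # reintegration: filtered points join their nearest dense neighbour's cluster
    for i in np.where(~dense)[0]:
        labels[i] = labels[idx[np.argmin(D[i, idx])]]
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)),
               rng.normal(5, 0.3, (30, 2))])
labels = adaptive_density_cluster(X)
```

On two well-separated Gaussian blobs this recovers the two groups without any per-dataset tuning of the density threshold itself.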
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Shu, Huizhen, Li, Xuying, Wang, Qirui, Kosuga, Yuji, Tian, Mengqiu, Li, Zhuo
With the rapid proliferation of Natural Language Processing (NLP), especially Large Language Models (LLMs), generating adversarial examples to jailbreak LLMs remains a key challenge for understanding model vulnerabilities and improving robustness. In this context, we propose a new black-box attack method that leverages the interpretability of large models. We introduce the Sparse Feature Perturbation Framework (SFPF), a novel approach for adversarial text generation that utilizes sparse autoencoders (SAEs) to identify and manipulate critical features in text. After using the SAE model to reconstruct hidden layer representations, we perform feature clustering on the successfully attacked texts to identify features with higher activations. These highly activated features are then perturbed to generate new adversarial texts. This selective perturbation preserves the malicious intent while amplifying safety signals, thereby increasing the texts' potential to evade existing defenses. Our method enables a new red-teaming strategy that balances adversarial effectiveness with safety alignment. Experimental results demonstrate that adversarial texts generated by SFPF can bypass state-of-the-art defense mechanisms, revealing persistent vulnerabilities in current NLP systems. However, the method's effectiveness varies across prompts and layers, and its generalizability to other architectures and larger models remains to be validated.
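The select-and-perturb step can be illustrated on toy data. Everything here is a stand-in: real SFPF operates on SAE reconstructions of LLM hidden states, whereas `activations` below is random data and `top_k` and `noise_scale` are hypothetical parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-in for SAE feature activations of 20 "successful" texts x 16 features
activations = rng.exponential(1.0, size=(20, 16))

def select_and_perturb(acts, top_k=3, noise_scale=0.5, rng=rng):
    # rank features by mean activation over the successfully attacked texts
    mean_act = acts.mean(axis=0)
    top = np.argsort(mean_act)[-top_k:]
    perturbed = acts.copy()
    # perturb only the highly activated features; everything else is untouched
    perturbed[:, top] += rng.normal(0, noise_scale, size=(len(acts), top_k))
    return perturbed, top

perturbed, top_features = select_and_perturb(activations)
```

The point of the sketch is the selectivity: only the few most-activated feature directions change, which is what lets the perturbed representations stay close to the originals elsewhere.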
Deep Clustering via Distribution Learning
Dong, Guanfang, Tan, Zijie, Zhao, Chenqiu, Basu, Anup
Distribution learning finds probability density functions from a set of data samples, whereas clustering aims to group similar data points to form clusters. Although some deep clustering methods employ distribution learning, past work still lacks a theoretical analysis of the relationship between clustering and distribution learning. Thus, in this work, we provide a theoretical analysis to guide the optimization of clustering via distribution learning, and we embed this theoretical guidance into deep clustering to achieve better results. Furthermore, existing distribution learning methods cannot always be directly applied to data. To overcome this issue, we introduce a clustering-oriented distribution learning method called Monte-Carlo Marginalization for Clustering, and integrate it into deep clustering, resulting in Deep Clustering via Distribution Learning (DCDL). The proposed DCDL achieves promising results compared to state-of-the-art methods on popular datasets, and on clustering tasks the new distribution learning method also outperforms previous approaches.
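The Monte-Carlo idea of marginalization — estimate a density by sampling the latent variable and averaging the conditional density — can be shown on a one-dimensional mixture. This toy is not DCDL's actual estimator, which operates in a learned space; the mixture weights and means below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
# p(x) = sum_k pi_k N(x; mu_k, sigma); the latent variable is the component k
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.0])
sigma = 1.0

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def mc_marginal(x, n_samples=20000):
    # Monte-Carlo marginalization: sample k ~ pi, average N(x; mu_k, sigma)
    ks = rng.choice(len(pi), size=n_samples, p=pi)
    return float(normal_pdf(x, mu[ks], sigma).mean())

exact = float(pi @ normal_pdf(0.0, mu, sigma))   # closed-form marginal at x = 0
estimate = mc_marginal(0.0)
```

The sample average is an unbiased estimate of the marginal, so with enough samples it lands arbitrarily close to the closed-form value.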
Quality check of a sample partition using multinomial distribution
In this paper, we propose a novel measure for checking the quality of a partition of a sample into several distinct classes, and thus for determining the unknown true number of clusters in a given dataset. Our approach applies the multinomial distribution to the distances of data members, clustered in a group, from their respective cluster representatives. This procedure is carried out independently for each cluster, and the resulting statistics are combined to form our targeted measure. Each cluster separately possesses category-wise probabilities corresponding to the different positions of its members relative to a typical member, in the form of the cluster centroid, medoid, or mode, referred to as the cluster representative. Our method is robust in the sense that it is distribution-free: it is devised irrespective of the parent distribution of the underlying sample. It also possesses a quality rare among existing cluster accuracy measures: the capability to investigate whether the given sample contains any inherent clusters at all, beyond a single group of all members. Our measure's simple concept, easy algorithm, fast runtime, good performance, and wide applicability, demonstrated through extensive simulation and diverse case studies, make it appealing.
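The binning idea can be sketched as follows: distances of members to their cluster representative are placed into categories and the observed counts compared against multinomial expectations. The Rayleigh-derived category probabilities and the chi-square-style statistic below are illustrative stand-ins, not the paper's exact measure.

```python
import numpy as np

rng = np.random.default_rng(3)

def rayleigh_edges():
    # theoretical quartiles of ||x - mu|| for a 2-D unit-variance Gaussian
    # cluster (the distance to the centroid is Rayleigh distributed)
    p = np.array([0.25, 0.5, 0.75])
    return np.concatenate(([0.0], np.sqrt(-2 * np.log(1 - p)), [np.inf]))

def multinomial_fit_statistic(points):
    centroid = points.mean(axis=0)                 # cluster representative
    d = np.linalg.norm(points - centroid, axis=1)  # member distances
    counts, _ = np.histogram(d, bins=rayleigh_edges())
    expected = np.full(4, len(points) / 4)         # multinomial expectation
    return float(((counts - expected) ** 2 / expected).sum())

good = rng.normal(0, 1, size=(200, 2))             # one coherent cluster
bad = np.vstack([rng.normal(-4, 1, (100, 2)),      # two separated groups
                 rng.normal(4, 1, (100, 2))])      # forced into one "cluster"
good_stat = multinomial_fit_statistic(good)
bad_stat = multinomial_fit_statistic(bad)
```

A coherent cluster fills the distance categories roughly as expected (small statistic), while a forced merge of two groups piles all distances into the outer category and the statistic explodes.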
Circular Clustering with Polar Coordinate Reconstruction
There is a growing interest in characterizing circular data found in biological systems. Such data are wide-ranging and varied, from signal phase in neural recordings to nucleotide sequences in round genomes. Traditional clustering algorithms are often inadequate due to their limited ability to distinguish differences in the periodic component. Current clustering schemes that work in a polar coordinate system have limitations, such as being only angle-focused or lacking generality. To overcome these limitations, we propose a new analysis framework that utilizes projections onto a cylindrical coordinate system to better represent objects in a polar coordinate system. Using the mathematical properties of circular data, we show that our approach always finds the correct clustering result within the reconstructed dataset, given sufficient periodic repetitions of the data. Our approach is generally applicable and adaptable and can be incorporated into most state-of-the-art clustering algorithms. We demonstrate on synthetic and real data that our method generates more appropriate and consistent clustering results compared to standard methods. In summary, our proposed analysis framework overcomes the limitations of existing polar coordinate-based clustering methods and provides a more accurate and efficient way to cluster circular data.
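The cylindrical-projection framework itself is more involved, but the reason periodicity breaks standard clustering can be shown with a minimal example: embedding angles as (cos t, sin t) makes 0 and 2π coincide, after which an ordinary k-means (a small NumPy version with a deterministic farthest-pair initialisation, not the paper's method) separates a cluster that straddles the wrap-around.

```python
import numpy as np

rng = np.random.default_rng(4)
# one cluster straddling the 0 / 2*pi wrap-around, one centred near pi
a = np.concatenate([rng.normal(0.0, 0.2, 50) % (2 * np.pi),
                    rng.normal(np.pi, 0.2, 50)])
X = np.column_stack([np.cos(a), np.sin(a)])   # periodic embedding

def kmeans2(X, iters=50):
    # deterministic farthest-pair initialisation avoids unlucky seeds
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    i, j = np.unravel_index(int(D.argmax()), D.shape)
    centers = X[[i, j]]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == m].mean(0) if (labels == m).any()
                            else centers[m] for m in range(2)])
    return labels

labels = kmeans2(X)
```

Clustering the raw angles directly would split the wrap-around cluster in two (values near 0 and near 2π look maximally far apart), which is exactly the failure the circular reconstruction avoids.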
Clustering with Neural Network and Index
A new model called Clustering with Neural Network and Index (CNNI) is introduced. CNNI uses a Neural Network to cluster data points. Training of the Neural Network mimics supervised learning, with an internal clustering evaluation index acting as the loss function. An experiment is conducted to test the feasibility of the new model, and its results are compared with those of other clustering models such as K-means and the Gaussian Mixture Model (GMM). The results show that CNNI can cluster data properly; equipped with MMJ-SC, CNNI becomes the first parametric (inductive) clustering model that can deal with non-convex shaped (non-flat geometry) data.
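The training idea — a network whose loss is an internal clustering index — can be caricatured without autograd. The single linear layer standing in for the network, the within-cluster-scatter loss standing in for MMJ-SC, and the random-search optimizer are all hypothetical simplifications of CNNI.

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-3, 0.5, (40, 2)),
               rng.normal(3, 0.5, (40, 2))])

def assign(W, b):
    # "network": one linear layer whose argmax output is the cluster id
    return np.argmax(X @ W + b, axis=1)

def internal_loss(labels, k=2):
    # stand-in internal index: within-cluster sum of squares;
    # assignments that empty a cluster are rejected outright
    loss = 0.0
    for j in range(k):
        pts = X[labels == j]
        if len(pts) == 0:
            return np.inf
        loss += ((pts - pts.mean(0)) ** 2).sum()
    return loss

best_W, best_b = rng.normal(size=(2, 2)), np.zeros(2)
best_loss = internal_loss(assign(best_W, best_b))
for _ in range(300):        # crude random-search "training" loop
    W = best_W + 0.3 * rng.normal(size=(2, 2))
    b = best_b + 0.3 * rng.normal(size=2)
    loss = internal_loss(assign(W, b))
    if loss < best_loss:
        best_W, best_b, best_loss = W, b, loss
labels = assign(best_W, best_b)
```

Because the learned object is the network's weights rather than a fixed partition, the same trained layer can assign new points to clusters, which is what makes this family of models parametric (inductive).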
Tree Index: A New Cluster Evaluation Technique
Beg, A. H., Islam, Md Zahidul, Estivill-Castro, Vladimir
We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of a clustering, rather than producing a quantitative cluster-quality index (where the representation power of a clustering is some cumulative error, similar to vector quantization). Tree Index finds margins amongst clusters for easy learning, without the complications of Minimum Description Length. It produces a decision tree from the clustered data set, using the cluster identifiers as labels, and combines the entropy of each leaf with its depth. Intuitively, a shorter tree with pure leaves generalizes the data well (the clusters are easy to learn because they are well separated), so the labels are meaningful clusters. If the clustering algorithm does not separate the clusters well, trees learned from its results will be large and too detailed. We show that, on clustering results (obtained by various techniques) for a brain dataset, Tree Index discriminates between reasonable and non-sensible clusters, and we confirm its effectiveness through graphical visualizations. Tree Index rates sensible solutions higher than non-sensible solutions, while existing cluster-quality indexes fail to do so.
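The leaf-entropy-plus-depth idea can be sketched with a tiny hand-rolled decision tree. The combination rule below, which scores shallow, pure leaves higher, is an illustrative guess at the flavour of Tree Index, not its published formula.

```python
import numpy as np

def entropy(y):
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def grow(X, y, depth=0, max_depth=3):
    # greedy axis-aligned splits; returns leaves as (entropy, depth, size)
    if entropy(y) == 0.0 or depth == max_depth:
        return [(entropy(y), depth, len(y))]
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            if left.all() or not left.any():
                continue
            score = (left.sum() * entropy(y[left]) +
                     (~left).sum() * entropy(y[~left]))
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return [(entropy(y), depth, len(y))]
    _, f, t = best
    left = X[:, f] <= t
    return (grow(X[left], y[left], depth + 1, max_depth) +
            grow(X[~left], y[~left], depth + 1, max_depth))

def tree_index(X, y):
    leaves = grow(X, np.asarray(y))
    n = sum(sz for _, _, sz in leaves)
    # illustrative rule: pure (low-entropy) and shallow leaves score higher
    return float(sum(sz / n / (1.0 + ent) / (1.0 + d) for ent, d, sz in leaves))

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-3, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
good_labels = np.array([0] * 30 + [1] * 30)   # well-separated clustering
bad_labels = rng.integers(0, 2, 60)           # non-sensible random labels
good_score = tree_index(X, good_labels)
bad_score = tree_index(X, bad_labels)
```

A sensible clustering is learned by one split into two pure depth-1 leaves, while random labels force a deep, impure tree and a lower score — the discrimination the paper describes.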
CNAK: Cluster Number Assisted K-means
Saha, Jayasree, Mukherjee, Jayanta
Determining the number of clusters present in a dataset is an important problem in cluster analysis. Conventional clustering techniques generally assume this parameter to be provided up front. In this paper, we propose a method which analyzes cluster stability for predicting the cluster number. Under the same computational framework, the technique also finds representatives of the clusters. The method is apt for handling big data, as we design the algorithm using Monte-Carlo simulation. Also, we explore a few issues found to be pertinent to clustering. Experiments reveal that the proposed method is capable of identifying a single cluster. It is robust in handling high-dimensional datasets and performs reasonably well over datasets having cluster imbalance. Moreover, it can indicate cluster hierarchy, if present. Overall, we have observed significant improvement in speed and quality for predicting cluster numbers as well as the composition of clusters in a large dataset.

Keywords: k-means clustering, Bipartite graph, Perfect Matching, Kuhn-Munkres Algorithm, Monte Carlo simulation.

1. Introduction

In cluster analysis, it is required to group a set of data points in a multidimensional space so that data points in the same group are more similar to each other than to those in other groups. These groups are called clusters. Various distance functions may be used to compute the degree of similarity or dissimilarity among these data points; the Euclidean distance function is the most widely used in clustering. The aim of this unsupervised technique is to increase homogeneity within a group and heterogeneity between groups. Several clustering methods with different characteristics have been proposed for different purposes. Some well-known methods include partition-based clustering [26], hierarchical clustering [25], spectral clustering [27], and density-based clustering [12]. However, they require the knowledge of the cluster number for a given dataset a priori [12, 21, 26, 27, 36].
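The stability loop behind CNAK can be sketched as follows: cluster two random subsamples for each candidate k, match the two centroid sets, and prefer the k whose centroids agree best. Greedy nearest matching replaces the Kuhn-Munkres algorithm here, and the subsample sizes, restart counts, and trial counts are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(c, 0.4, (60, 2))
               for c in [(-4.0, 0.0), (4.0, 0.0), (0.0, 4.0)]])

def kmeans_centers(pts, k, iters=30, restarts=5):
    best, best_w = None, np.inf
    for _ in range(restarts):                 # restarts guard against bad inits
        centers = pts[rng.choice(len(pts), k, replace=False)]
        for _ in range(iters):
            lab = np.argmin(((pts[:, None] - centers) ** 2).sum(-1), 1)
            centers = np.array([pts[lab == j].mean(0) if (lab == j).any()
                                else centers[j] for j in range(k)])
        lab = np.argmin(((pts[:, None] - centers) ** 2).sum(-1), 1)
        w = ((pts - centers[lab]) ** 2).sum()  # within-cluster scatter
        if w < best_w:
            best, best_w = centers, w
    return best

def stability(X, k, trials=5):
    score = 0.0
    for _ in range(trials):
        # cluster two independent half-size Monte-Carlo subsamples
        a = kmeans_centers(X[rng.choice(len(X), len(X) // 2, replace=False)], k)
        b = kmeans_centers(X[rng.choice(len(X), len(X) // 2, replace=False)], k)
        D = np.linalg.norm(a[:, None] - b[None, :], axis=-1)
        total = 0.0
        # greedy centroid matching (CNAK itself uses Kuhn-Munkres)
        for _ in range(k):
            i, j = divmod(int(D.argmin()), k)
            total += D[i, j]
            D[i, :], D[:, j] = np.inf, np.inf
        score += total / k
    return score / trials

scores = {k: stability(X, k) for k in range(2, 6)}
best_k = min(scores, key=scores.get)
```

A k matching the data's true structure tends to reproduce nearly identical centroids across subsamples, so its average matched-centroid distance is low; a wrong k leaves at least one centroid free to wander between runs.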
A Novel Initial Clusters Generation Method for K-means-based Clustering Algorithms for Mixed Datasets
Mixed datasets consist of numeric and categorical attributes. Various K-means-based clustering algorithms have been developed to cluster these datasets. Generally, these algorithms use random initial clusters, which in turn produce different clustering results in different runs. A few cluster initialisation methods have been developed to compute initial clusters; however, they are either computationally expensive or do not produce the same clustering results across runs. In this paper, we propose a novel approach to finding initial clusters for K-means-based clustering algorithms for mixed datasets. The proposed approach is based on the observation that some data points remain in the same clusters created by a K-means-based clustering algorithm irrespective of the choice of initial clusters. We propose that individual attribute information can be used to create initial clusters: a K-means-based clustering algorithm is run many times, and in each run one of the attributes is used to create the initial clusters. The clustering results of the various runs are then combined into a single clustering result, which is used as the initial clusters for a final run of the K-means-based clustering algorithm. Experiments with various categorical and mixed datasets showed that the proposed approach produces accurate and consistent results.
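The per-attribute initialisation can be sketched on numeric data only (the paper's mixed numeric/categorical distance machinery is omitted). One clustering run per attribute, each seeded deterministically from that attribute's quantiles, is followed by a co-association consensus; the consensus-component glue and the "k largest groups" rule are this sketch's own illustrative choices, not the paper's combination procedure.

```python
import numpy as np

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(-3, 0.5, (40, 2)),
               rng.normal(3, 0.5, (40, 2))])
k = 2

def kmeans_labels(X, centers, iters=30):
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - centers) ** 2).sum(-1), 1)
        centers = np.array([X[lab == j].mean(0) if (lab == j).any()
                            else centers[j] for j in range(len(centers))])
    return lab

# one run per attribute, seeded by the points nearest that attribute's
# quantiles (deterministic, so every run is repeatable)
runs = []
for f in range(X.shape[1]):
    qs = np.quantile(X[:, f], np.linspace(0.25, 0.75, k))
    seeds = [int(np.argmin(np.abs(X[:, f] - q))) for q in qs]
    runs.append(kmeans_labels(X, X[seeds]))

# consensus: points that co-cluster in every run form initial groups
co = sum((lab[:, None] == lab[None, :]).astype(int) for lab in runs)
adj = co == len(runs)
init_labels = np.full(len(X), -1)
g = 0
for i in range(len(X)):
    if init_labels[i] != -1:
        continue
    stack, init_labels[i] = [i], g
    while stack:
        j = stack.pop()
        for m in np.where(adj[j] & (init_labels == -1))[0]:
            init_labels[m] = g
            stack.append(m)
    g += 1

# the k largest consensus groups seed the final, deterministic run
sizes = np.bincount(init_labels)
top_groups = np.argsort(-sizes)[:k]
init_centers = np.array([X[init_labels == j].mean(0) for j in top_groups])
final = kmeans_labels(X, init_centers)
```

Because every step is seeded from the data rather than from random draws, repeating the whole pipeline reproduces the same initial clusters and the same final result, which is the consistency property the abstract emphasises.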
Hierarchical clustering that takes advantage of both density-peak and density-connectivity
Zhu, Ye, Ting, Kai Ming, Jin, Yuan, Angelova, Maia
This paper focuses on density-based clustering, particularly the Density Peak (DP) algorithm and the density-connectivity-based DBSCAN, and proposes a new method which takes advantage of the individual strengths of these two methods to yield a density-based hierarchical clustering algorithm. Our investigation begins by formally defining the types of clusters DP and DBSCAN are designed to detect, and then identifies the kinds of distributions in which DP and DBSCAN individually fail to detect all clusters in a dataset. These identified weaknesses inspire us to formally define a new kind of cluster and to propose a new method, DC-HDP, which overcomes these weaknesses to identify clusters with arbitrary shapes and varied densities. In addition, the new method produces a richer clustering result in the form of a hierarchy or dendrogram, enabling a better understanding of cluster structures. Our empirical evaluation shows that DC-HDP produces the best clustering results on 14 datasets in comparison with 7 state-of-the-art clustering algorithms.
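The Density Peak half of the comparison is easy to sketch: a local density rho counts neighbours within a cutoff d_c, delta is the distance to the nearest denser point, and large rho*delta marks cluster centres; every other point follows its nearest denser neighbour. The cutoff and the two-centre choice below are hand-picked for the toy data, and DC-HDP's hierarchy and DBSCAN-style connectivity are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(-3, 0.5, (50, 2)),
               rng.normal(3, 0.5, (50, 2))])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
d_c = 1.0                                   # cutoff distance (hand-picked)
rho = (D < d_c).sum(1) - 1                  # local density, excluding self
delta = np.empty(len(X))
nearest_higher = np.full(len(X), -1)
order = np.argsort(-rho)                    # points by decreasing density
for rank, i in enumerate(order):
    if rank == 0:
        delta[i] = D[i].max()               # convention for the global peak
        continue
    higher = order[:rank]                   # all denser (earlier-ranked) points
    j = higher[np.argmin(D[i, higher])]
    delta[i], nearest_higher[i] = D[i, j], j
centers = np.argsort(-(rho * delta))[:2]    # two largest rho*delta scores
labels = np.full(len(X), -1)
labels[centers] = [0, 1]
for i in order:                             # descend in density, inherit label
    if labels[i] == -1:
        labels[i] = labels[nearest_higher[i]]
```

Centres stand out because they combine high density with a large gap to any denser point — exactly the property that lets DP find one representative per mode, while the connectivity side (DBSCAN) contributes the arbitrary-shape handling that DC-HDP layers on top.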